This paper is meant as a contribution to the often debated subject of how to combine
data which appear to be in mutual disagreement. As a practical example, the $\varepsilon'/\varepsilon$
determinations have been considered.



(Submitted to Physical Review D) 



Università 'La Sapienza' and Sezione INFN di Roma 1, Rome, Italy, and CERN, Geneva, Switzerland
Email: dagostini@roma1.infn.it
URL: http://www-zeus.roma1.infn.it/~agostini



1 Introduction 

Every physicist knows the rule for combining several experimental results:

$$ \mu = \frac{\sum_i d_i/s_i^2}{\sum_i 1/s_i^2} , \qquad (1) $$

$$ \sigma(\mu) = \left( \sum_i 1/s_i^2 \right)^{-1/2} , \qquad (2) $$

where $\mu$ refers to the true value and $d_i \pm s_i$ stands for the individual data point (the use
of $s_i$, instead of the usual $\sigma_i$, for the standard uncertainty reported by the experiments
will become clear later; similarly, the meaning of $\mu$ and of $\sigma(\mu)$ is left loosely
defined for the moment and will be made precise later). The above rule, hereafter
called standard combination rule, is based on some hypotheses which are worth recalling:
i) all measurements refer to the same quantity; ii) the measurements are independent;
iii) the probability distribution of $d_i$ around $\mu$ is described by a Gaussian distribution
with standard deviation given by $\sigma_i = s_i$. If one, or several, of these hypotheses are not
satisfied, the result of Eqs. (1)-(2) is questionable.
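The standard combination rule is simple enough to be sketched in a few lines. The example below is only an illustration: the function name `standard_combination` is hypothetical, and the input values are the five results marked as used in Table 1 (units of $10^{-4}$), with the rounded total uncertainties quoted there.

```python
import math

# (label, d_i, s_i) in units of 1e-4: the results marked as used in Table 1,
# with the rounded total uncertainties quoted in that table.
RESULTS = [
    ("E731 (1988)", 32.0, 30.0),
    ("E731 (1993)", 7.4, 5.9),
    ("NA31 (1988+1993)", 23.0, 6.5),
    ("KTeV (1999)", 28.0, 4.1),
    ("NA48 (1999)", 18.5, 7.3),
]

def standard_combination(results):
    """Inverse-variance weighted mean, Eq. (1), and its standard deviation, Eq. (2)."""
    weights = [1.0 / s**2 for _, _, s in results]
    total = sum(weights)
    mean = sum(w * d for w, (_, d, _) in zip(weights, results)) / total
    sigma = 1.0 / math.sqrt(total)
    return mean, sigma

mean, sigma = standard_combination(RESULTS)
print(f"standard combination: {mean:.2f} +/- {sigma:.2f}")
```

Because the rounded total uncertainties are used as inputs, the output can differ in the last digit from values obtained with unrounded uncertainties.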

Now we are confronted with the problem that we are never absolutely sure if these
hypotheses are true or not. If we were absolutely convinced that the hypotheses were
correct, there would be no reason to hesitate to apply Eqs. (1)-(2), no matter how
'apparently incompatible' the data points might appear. But we know by experience that
unrecognized sources of systematic errors might affect the results, or that the uncertainty
associated with the recognized sources might be underestimated (but we also know that,
often, this kind of uncertainty is prudently overstated...).

As is always the case in the domain of uncertainty, there is no 'objective' method
for handling this problem; neither in deciding if the data are in mutual disagreement,
nor in arriving at a universal solution for handling those cases which are judged to be
troublesome. Only good sense gained by experience can provide some guidance. Therefore,
all automatic 'prescriptions' should be considered cum grano salis. For example, the usual
method for checking the hypothesis that 'the data are compatible with each other' is to
make a $\chi^2$ test. The hypothesis is accepted if, generally speaking, the $\chi^2$ does not differ
too much from the expected value. As a strict rule, the $\chi^2$ test is not really logically
grounded (see e.g. Section 1.8 of Ref. [1]) although it does 'often work', due to implicit
hypotheses which are external to the standard $\chi^2$ test scheme (see Section 8.7 of Ref. [1]),
but which lead to mistaken conclusions when the unstated hypotheses are not reasonable
(see e.g. Section 1.9 of Ref. [1]). Therefore, I shall not attempt here to quantify the degree
of suspicion. I shall assume a situation in which experienced physicists, faced with a set
of results, tend to be uneasy about the mutual consistency of the picture that those data
offer.

As an example, let us consider the results of Table 1, which are also reported
in a graphical form in Fig. 1. Figure 2 shows also the combined result obtained using
Eqs. (1)-(2), as well as some combinations of subsamples of the results. These results
have not been chosen as the best example of disagreeing data, but because of the physics
interest, and also because the situation is at the edge of where one starts worrying. The
impression of uneasiness is not only because the mutual agreement among the experimental
results is not at the level one would have wished, but also because the value of $\mathrm{Re}(\varepsilon'/\varepsilon)$
around which the experimental results cluster is somewhat far from the theoretical
evaluations (see e.g. Refs. 0, |T^, 0, ^ |T3| and references therein). Now, it is clear that



Table 1: Published results on $\mathrm{Re}(\varepsilon'/\varepsilon)$ (values in units of $10^{-4}$). Data points indicated by √
have been used for quantitative evaluations. Owing to correlations between the 1988 and 1993
uncertainties of NA31, only the combined value published in 1993 is used.

     Experiment            Central value   $\sigma_{\rm stat} \pm \sigma_{\rm syst}$   $\sigma_{\rm tot}$
  √  E731 (1988)               32            ±28  ± 12            30
     NA31 (1988)               33            ±6.6 ± 8.3           11
  √  E731 (1993)                7.4          ±5.2 ± 2.9            5.9
     NA31 (1993)               20            ±4.3 ± 5.0            7
  √  NA31 (1988+1993)          23.0          ±4   ± 5              6.5
  √  KTeV (1999)               28.0          ±3.0 ± 2.8            4.1
  √  NA48 (1999)               18.5          ±4.5 ± 5.8            7.3

Figure 1: Results on $\mathrm{Re}(\varepsilon'/\varepsilon)$ obtained at CERN (solid line) and Fermilab (dashed line), where
$\varepsilon = \mathrm{Re}(\varepsilon'/\varepsilon) \times 10^{4}$.

Figure 2: Some combinations of the experimental results obtained using the standard
combination rule of Eqs. (1)-(2). Upper plot: old results (dashed line), 1999 results (solid line),
overall combination (dotted grey line). Lower plot: CERN experiments (solid line), Fermilab
experiments (dashed), overall combination (dotted grey line).



experimentalists should not be biased towards theoretical expectations, and the history
of physics teaches us about wrong results published to please theory. But we are also aware
of unexpected results (either claims of new physics, or simply a quantitative disagreement
with respect to the global scenario offered by other results within the framework of the
Standard Model) which finally turn out to be false alarms. In conclusion, given the present
picture of theory versus experiments about $\varepsilon'/\varepsilon$, there is plenty of room for doubt: doubt
about theory, about individual experiments, and about experiments as a whole.

In this situation, drawing conclusions based on a blind application of Eqs. (1)-(2)
seems a bit naive. For example, a straightforward conclusion of the standard combination
rule leads to a probability that $\mathrm{Re}(\varepsilon'/\varepsilon)$ is smaller than zero of the order of $0.5 \times 10^{-14}$,
and I don't think that experienced physicists would share without hesitation beliefs of
this order of magnitude.

This paper deals with modelling the beliefs of an experienced sceptical physicist
confronted with results of this kind, continuing on from a recent work of Dose and von
der Linden on outliers [14].



2 Hypotheses behind the simple combination rule 

Equation (1) has been written, on purpose, in a way that might be misleading,
although this is the way in which it often appears. In fact, taken literally, it says that $\mu$
is equal to the right-hand side of Eq. (1). Instead, as is well understood, this is just the
value around which our beliefs are centred, usually referred to as the estimator. Given a
Gaussian model, the estimator given by Eq. (1) corresponds to the value which we believe
mostly (mode), and also to the barycentre of the probability distribution¹ of $\mu$ (expected
value) and to the value which defines two semi-open intervals in each of which we believe
$\mu$ to lie with equal probability (median).

In order to obtain a combination rule different from Eqs. (1)-(2), it is important to
remember where these formulae come from. Although this rule is usually taught in the
framework of maximum likelihood, the most general way to get it is by using Bayesian
inference, as we shall show now.

The simplest way to write Bayes' theorem for continuous variables is:

$$ f(\mu \,|\, \mathbf{d}) \propto f(\mathbf{d} \,|\, \mu) \cdot f_0(\mu) , \qquad (3) $$

where the set of data points $\{d_1, d_2, \ldots, d_n\}$ is indicated by $\mathbf{d}$; the function $f(\mu \,|\, \mathbf{d})$ is
the final probability density function (p.d.f.) of $\mu$ in the light of the experimental results
and of all other prior knowledge about measurement and measurand; $f(\mathbf{d} \,|\, \mu)$ represents
the likelihood of observing the data set $\mathbf{d}$ under the hypothesis that the true value is
exactly $\mu$; $f_0(\mu)$ is the prior p.d.f. of $\mu$. The proportionality factor is obtained by the
normalization condition $\int f(\mu \,|\, \mathbf{d}) \, d\mu = 1$. The assumption that each of the observed values
is normally distributed around $\mu$ with standard deviation $\sigma_i$ and that the measurements
are independent leads to

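$$ f(\mathbf{d} \,|\, \mu) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{(d_i - \mu)^2}{2\sigma_i^2} \right] . \qquad (4) $$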



If the experimental resolution described by the likelihood is sufficiently high and $\mu$ is
a quantity which can assume, in principle, values in a large interval (virtually any real
values), a uniform prior distribution, i.e. $f_0(\mu) = k$, is a very reasonable assumption. In
fact, any other mathematical function which models the vagueness of the prior knowledge
(with respect to what the measurement is supposed to yield) acts in practice as a constant
in the region of $\mu$ where the likelihood varies rapidly. Putting all the ingredients together
and renormalizing the final p.d.f. we get



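$$ f(\mu \,|\, \mathbf{d}, \text{indep. Gaussians}, \boldsymbol{\sigma}, f_0(\mu) = k) = \frac{1}{\sqrt{2\pi}\,\sigma(\mu)} \exp\left[ -\frac{(\mu - \mathrm{E}[\mu])^2}{2\sigma^2(\mu)} \right] , \qquad (5) $$

where

$$ \mathrm{E}[\mu] = \frac{\sum_i d_i/\sigma_i^2}{\sum_i 1/\sigma_i^2} , \qquad (6) $$

$$ \sigma(\mu) = \left( \sum_i 1/\sigma_i^2 \right)^{-1/2} , \qquad (7) $$
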

¹ Following physics intuition, we consider it natural to talk about probability of true values. For
historical reasons, this point of view is currently known by the somewhat esoteric name of Bayesian,
to distinguish it from the so-called frequentistic point of view, according to which the category of
probable should not be applied to true values and, generally speaking, to hypotheses. For a physicist's
introduction to Bayesian reasoning see Ref. [1], or Ref. [15] for a short account.



obtained assuming that the $\sigma_i$ of Eq. (4) are exactly equal to the stated
uncertainties $s_i$. In Eq. (5) all conditions have been explicitly stated. This derivation shows
that there is indeed a fourth important implicit assumption in order to arrive at Eqs.
(1)-(2), namely a uniform prior² on $\mu$. This is why the maximum belief coincides with
the maximum of the likelihood, and why the best estimate of $\mu$ is the same as is
obtainable from the maximum likelihood principle. Nevertheless, the route followed here is
more general and more intuitive, as discussed extensively in Ref. [1]. In particular, one
can speak consistently about probability of true values, a concept close to the natural
reasoning of physicists [17].



3 Probabilistic modelling of scepticism 

Once we have understood what is behind the simple combination rule, it is possible
to change one of the hypotheses entering Eq. (5). Obviously, the problem has no unique
solution. This depends to a great extent on the status of knowledge about the experiments
which provided the results. For example, if one has formed a personal idea concerning the
degree of reliability of the different experimental teams, one can attribute different weights
to different results, or even disregard results considered unreliable or obsolete (for example
their corrections for systematic effects could depend on theoretical inputs which are now
considered to be obsolete). Wishing to arrive at a solution which, with all the imaginable
limitations a general solution may have, is applicable to many situations without an inside,
detailed knowledge of each individual experiment, we have to make some choices. First,
we decide that our sceptic is democratic, i.e. 'he' has no a priori preference for a particular
experiment. Second, the easiest way of modelling his scepticism, keeping the mathematics
simple, is to consider the likelihood still Gaussian, but with a standard deviation which
might differ from that quoted by the experimentalists by a factor $r_i$ which is not exactly
known:

$$ r_i = \frac{\sigma_i}{s_i} . \qquad (8) $$

The uncertainty about $r_i$ can be described by a p.d.f. $f(r_i)$. This uncertainty changes each
factor appearing in the likelihood (4), as can be evaluated by the probability rules:

$$ f(d_i \,|\, \mu) = \int f(d_i \,|\, \mu, r_i) \cdot f(r_i) \, dr_i , \qquad (9) $$

with

$$ f(d_i \,|\, \mu, r_i, s_i) = \frac{1}{\sqrt{2\pi}\, r_i s_i} \exp\left[ -\frac{(d_i - \mu)^2}{2 r_i^2 s_i^2} \right] . \qquad (10) $$

If one believes that all $r_i$ are exactly one, i.e. $f(r_i) = \delta(r_i - 1) \ \forall i$, the standard combination
rule is recovered. Because of our basic assumption of democracy, the mathematical
expression of the p.d.f. of $r_i$ will not depend on $i$, therefore we shall talk hereafter, generically,
about $r$ and $f(r)$.



² For those used to frequentistic methods, in which 'there are no priors', I would like to recall how
Gauss [16] derived his famous Gaussian distribution describing experimental errors. He made explicit
use of the concepts of prior and posterior probability of hypotheses, and derived a formula equivalent
to Bayes' theorem valid for a priori equiprobable hypotheses (condition explicitly stated). Then, using
some symmetry arguments, plus the condition that the final distribution is maximized when the true
value of the quantity equals the arithmetic average of the measurements, he obtained the functional
form of the error distribution (playing the role of likelihood), which is now named after him.



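A solution to the problem of finding a parametrization of $f(r)$ such that this p.d.f. is
acceptable to experienced physicists, while the integral (9) still has a closed form,
has been proposed by Dose and von der Linden [14]; an improved version of it will be used
in this paper [18]. Following Ref. [14], we choose initially the variable $\omega = 1/r^2 = s_i^2/\sigma_i^2$,
and consider it to be described by a gamma distribution:

$$ f(\omega \,|\, \lambda, \delta) = \frac{\lambda^\delta\, \omega^{\delta - 1}\, e^{-\lambda\omega}}{\Gamma(\delta)} , \qquad (11) $$

where $\lambda$ and $\delta$ are the so-called scale and shape parameters, respectively. As a function of
these two parameters, expected value and variance of $\omega$ are $\mathrm{E}[\omega] = \delta/\lambda$ and $\mathrm{Var}(\omega) = \delta/\lambda^2$.
Using probability calculus we get the p.d.f. of $r$:

$$ f(r \,|\, \lambda, \delta) = \frac{2\,\lambda^\delta\, r^{-(2\delta + 1)}\, e^{-\lambda/r^2}}{\Gamma(\delta)} , \qquad (12) $$

where the parameters have been written explicitly as conditionands for the probability
distribution. Expected value and variance of $r$ are:

$$ \mathrm{E}[r] = \sqrt{\lambda}\; \frac{\Gamma(\delta - 1/2)}{\Gamma(\delta)} , \qquad (13) $$

$$ \mathrm{Var}(r) = \frac{\lambda}{\delta - 1} - \mathrm{E}^2[r] , \qquad (14) $$

existing simultaneously if $\lambda > 0$ and $\delta > 1$.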
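As a quick numerical check of how the two parameters translate into beliefs about $r$, the sketch below evaluates $\mathrm{E}[r] = \sqrt{\lambda}\,\Gamma(\delta - 1/2)/\Gamma(\delta)$ and $\mathrm{E}[r^2] = \mathrm{E}[1/\omega] = \lambda/(\delta - 1)$, which follow from the gamma distribution assumed for $\omega$. The function name `r_moments` is a hypothetical helper, not something defined in the paper.

```python
import math

def r_moments(lam, delta):
    """Return (E[r], sigma(r)) when omega = 1/r^2 follows a gamma p.d.f.

    lam is the scale parameter and delta the shape parameter; both moments
    exist only for lam > 0 and delta > 1.
    """
    if lam <= 0 or delta <= 1:
        raise ValueError("need lam > 0 and delta > 1 for finite mean and variance")
    mean = math.sqrt(lam) * math.gamma(delta - 0.5) / math.gamma(delta)
    second_moment = lam / (delta - 1.0)  # E[r^2] = E[1/omega]
    return mean, math.sqrt(second_moment - mean**2)

for lam, delta in [(0.6, 1.3), (1.4, 2.1)]:
    mean, sigma = r_moments(lam, delta)
    print(f"lambda = {lam}, delta = {delta}: E[r] = {mean:.3f}, sigma(r) = {sigma:.3f}")
```

Since the quoted parameter values are rounded, the moments come out only approximately equal to the nominal targets.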

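The individual likelihood, integrated over the possible values of $r$, is obtained by
inserting Eqs. (10) and (12) in Eq. (9):

$$ f(d_i \,|\, \mu, s_i, \lambda, \delta) = \frac{\lambda^\delta\, \Gamma(\delta + 1/2)}{\sqrt{2\pi}\, s_i\, \Gamma(\delta)} \left[ \lambda + \frac{(d_i - \mu)^2}{2 s_i^2} \right]^{-(\delta + 1/2)} . \qquad (15) $$
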
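Using a uniform prior distribution for $\mu$, and remembering that we are dealing with
independent results, we have finally:

$$ f(\mu \,|\, \mathbf{d}, \mathbf{s}) \propto f(\mathbf{d} \,|\, \mathbf{s}, \mu) \propto \prod_i \left[ \lambda + \frac{(d_i - \mu)^2}{2 s_i^2} \right]^{-(\delta + 1/2)} , \qquad (16) $$

where $\mathbf{s} = \{s_1, s_2, \ldots, s_n\}$. The normalization factor can be determined numerically.
Equation (16) should be written, more properly, as $f(\mu \,|\, \mathbf{d}, \mathbf{s}, \lambda, \delta)$, to remind us that the solution
depends on the choice of $\lambda$ and $\delta$. This notation also shows how to get a solution which takes into
account all reasonable choices of the parameters:

$$ f(\mu \,|\, \mathbf{d}, \mathbf{s}) = \int f(\mu \,|\, \mathbf{d}, \mathbf{s}, \lambda, \delta) \cdot f(\lambda, \delta) \, d\lambda \, d\delta , \qquad (17) $$

where $f(\lambda, \delta)$ quantifies the confidence on each possible pair of parameters.³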
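Since the normalization has to be done numerically, a brute-force grid evaluation is enough to see the sceptical posterior at work. The sketch below is only illustrative: the function names, the grid range and the grid spacing are assumptions made here, the inputs are the results marked as used in Table 1 (units of $10^{-4}$, rounded total uncertainties), and the default parameters are $\lambda = 0.6$ and $\delta = 1.3$.

```python
import math

# (d_i, s_i) in units of 1e-4: the five results marked as used in Table 1.
RESULTS = [(32.0, 30.0), (7.4, 5.9), (23.0, 6.5), (28.0, 4.1), (18.5, 7.3)]

def sceptical_density(mu, results, lam=0.6, delta=1.3):
    """Unnormalized posterior of mu: product of [lam + (d-mu)^2/(2 s^2)]^-(delta+1/2)."""
    log_f = 0.0
    for d, s in results:
        log_f -= (delta + 0.5) * math.log(lam + (d - mu) ** 2 / (2.0 * s * s))
    return math.exp(log_f)

def summarize(results, lo=-60.0, hi=100.0, n=16001, **params):
    """Mean, standard deviation and mode on a uniform grid (range chosen by hand)."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    dens = [sceptical_density(mu, results, **params) for mu in grid]
    norm = sum(dens)
    mean = sum(m * p for m, p in zip(grid, dens)) / norm
    var = sum((m - mean) ** 2 * p for m, p in zip(grid, dens)) / norm
    mode = grid[dens.index(max(dens))]
    return mean, math.sqrt(var), mode

mean, std, mode = summarize(RESULTS)
print(f"sceptical combination: mean = {mean:.1f}, std = {std:.1f}, mode = {mode:.1f}")
```

Other parameter choices can be tried by passing `lam=` and `delta=` to `summarize`; averaging the resulting densities with weights $f(\lambda, \delta)$ would be a discrete version of Eq. (17).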

A natural constraint on the values of the parameters comes from the requirement
$\mathrm{E}[r] = 1$, modelling the assumption that the $\sigma$'s agree, on average, with the stated



³ $\lambda$ and $\delta$ are the same for all experiments as we are modelling a democratic scepticism. In general they
could depend on the experiment, thus changing Eq. (16).




Figure 3: Distribution of the rescaling factor $r = \sigma_{\rm true}/\sigma_{\rm est}$ using the parametrization of
Eq. (12) for several values of the set of parameters $(\lambda, \delta)$; the solid line corresponds to what will
be taken as the reference distribution in this paper, yielding $\mathrm{E}[r] = \sigma(r) = 1$, and it is obtained
for $\lambda \approx 0.6$ and $\delta \approx 1.3$. Dotted and dashed lines show the p.d.f. of $r$ yielding $\sigma(r) = 0.5$ and
1.5, respectively.



uncertainties. The standard deviation of the distribution gives another constraint.
Conservative considerations suggest $\sigma(r)/\mathrm{E}[r] \approx O(1)$. The condition $\mathrm{E}[r] = \sigma(r) = 1$ is obtained
for $\lambda \approx 0.6$ and $\delta \approx 1.3$. The resulting p.d.f. of $r$ is shown as the continuous line of Fig. 3.
One can see that the parametrization of $f(r)$ corresponds qualitatively to intuition: the
barycentre of the distribution is 1; values below $r \approx 1/2$ are considered practically
impossible; on the other hand, very large values of $r$ are conceivable, although with very small
probability, indicating that large overlooked systematic errors might occur. Anyway, we
feel that, besides general arguments and considerations about the shape of $f(r)$ (to which
we are not used), what matters is how reasonable the results look. Therefore, the method
has been tested with simulated data, shown in the left plots of Fig. 4.

For simplicity, all individual results are taken to have the same standard deviation
(note that the upper left plot of Fig. 4 shows the situation of two identical results). The
solid curve of the right-hand plots shows the combined result obtained using Eq. (16)
with $\lambda = 0.6$ and $\delta = 1.3$, yielding $\mathrm{E}[r] = \sigma(r) = 1$. For comparison, the dashed lines
show also the result obtained by the standard combination. The method described in this
paper, with parameters chosen by general considerations, tends to behave in qualitative
agreement with the expected point of view of a sceptical experienced physicist. As soon
as the individual results start to disagree, the combined distribution gets broader than
the standard combination, and might become multi-modal if the results cluster in several
places. However, if the agreement is somehow 'too good' (first and last case of Fig. 4) the
combined distribution becomes narrower than the standard result.

In order to get a feeling about the sensitivity of the results to the choice of
the parameters, two other sets of parameters have been tried, keeping the requirement
$\mathrm{E}[r] = 1$, but varying $\sigma(r)$ by ±50%: $\sigma(r) = 0.5$ is obtained for $\lambda \approx 1.4$ and $\delta \approx 2.1$;
$\sigma(r) = 1.5$ is obtained for $\lambda \approx 0.4$ and $\delta \approx 1.1$. The resulting p.d.f.'s of $r$ are shown in
Fig. 3. The results obtained using these two sets of parameters on the simulated data of
Fig. 4 are shown in Fig. 5. We see that, indeed, the choice $\mathrm{E}[r] = \sigma(r) = 1$ seems to be
Figure 4: Examples of sceptical combination of results. The plots on the left-hand side show
the individual results (in the upper plot the two results coincide). The plots on the right-hand
side show the combined result obtained using Eq. (16) with the constraint $\mathrm{E}[r] = \sigma(r) = 1$
(continuous lines), compared with the standard combination (dashed lines).

Figure 5: Combination of results obtained by varying the parameters of the sceptical
combination. Left panels: Eq. (16), $\lambda = 1.4$ and $\delta = 2.1$ [$\sigma(r) = 0.5$]. Right panels: Eq. (16),
$\lambda = 0.4$ and $\delta = 1.1$ [$\sigma(r) = 1.5$].

Figure 6: Sceptical perception of a single measurement having a standard deviation equivalent to
the standard combination of the top of Fig. 4. Note how the result differs from the combination
of the individual results.

an optimum, and the ±50% variations of $\sigma(r)$ give results which are at the edge of what
one would consider to be acceptable. Therefore, we shall take the parameters providing
$\mathrm{E}[r] = \sigma(r) = 1$ as the reference ones.

Another interesting feature of Eq. (16) is its behaviour for a single experimental
result, as shown in Fig. 6. For comparison, we have taken a result having a stated standard
deviation equal to $1/\sqrt{2}$ of each of those of Fig. 4. Figure 6 has to be compared with the
upper right plots of Fig. 4. The sceptical combination takes much more seriously two
independent experiments, each reporting an uncertainty $\sigma$, than a single experiment
reporting $\sigma/\sqrt{2}$. On the contrary, the two situations are absolutely equivalent in the
standard combination rule. In particular, the tails of the p.d.f. obtained by the sceptical
combination vanish more slowly than in the Gaussian case, while the belief in the central
value is higher. The result models the qualitative attitude of sceptical physicists, according
to whom a single experiment is never enough to establish a value, no matter how precise
the result may be, although the true value might have more chance to be within one
standard deviation than the probability level calculated from a Gaussian distribution.
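The single-measurement behaviour can be checked numerically. The sketch below, written under the reference parameters $\lambda = 0.6$ and $\delta = 1.3$, integrates the one-result version of Eq. (16) in the reduced variable $x = (\mu - d)/s$ and compares the probability of a few intervals with the Gaussian values. The function names, the truncation of the tails at $|x| = 500$ and the step size are choices made here for illustration only.

```python
import math

LAM, DELTA = 0.6, 1.3  # reference parameters, giving E[r] and sigma(r) close to 1

def single_result_density(x, lam=LAM, delta=DELTA):
    """Unnormalized sceptical p.d.f. for one result, with x = (mu - d) / s."""
    return (lam + 0.5 * x * x) ** (-(delta + 0.5))

def prob_within(k, half_range=500.0, step=0.01):
    """P(|mu - d| < k s) by midpoint integration, tails truncated at +/- half_range."""
    n = int(round(2.0 * half_range / step))
    total = inside = 0.0
    for i in range(n):
        x = -half_range + (i + 0.5) * step
        p = single_result_density(x)
        total += p
        if abs(x) < k:
            inside += p
    return inside / total

for k in (1, 2, 3):
    gauss = math.erf(k / math.sqrt(2.0))  # Gaussian probability of |x| < k
    print(f"k = {k}: sceptical {prob_within(k):.4f}, Gaussian {gauss:.4f}")
```

The comparison at $k = 1$ probes the higher belief in the central region, while the one at $k = 3$ probes the slower fall-off of the tails.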

4 Application to $\varepsilon'/\varepsilon$

The combination rule based on Eq. (16) has been applied to the results about
$\mathrm{Re}(\varepsilon'/\varepsilon)$ shown in Table 1. As discussed above, our reference parameters are $\lambda = 0.6$
and $\delta = 1.3$, corresponding to $\mathrm{E}[r] \approx \sigma(r) \approx 1$. The resulting p.d.f. for $\varepsilon = \mathrm{Re}(\varepsilon'/\varepsilon) \times 10^{4}$
is shown as the thick continuous line of Fig. 7 together with the individual results
(dotted lines). For comparison, we also give the result obtained using the combination
rules commonly applied in particle physics. The grey-dashed line of Fig. 7 is obtained
with the standard combination rule [Eqs. (1) and (2)]. The thin continuous line has been
evaluated using the Particle Data Group (PDG) 'prescription' [19]. According to this rule,
the standard deviation (2) is enlarged by a factor given by $\sqrt{\chi^2/(N - 1)}$, where $\chi^2$ is the
chi-square of the data with respect to the average (1) and $N$ is the number of independent
results.⁴
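The rescaling described above amounts to a few lines of arithmetic. The sketch below applies the factor $\sqrt{\chi^2/(N-1)}$ to the five results of Table 1 marked as used (units of $10^{-4}$, rounded total uncertainties). It covers only the rule as stated in the text; the function name `rescaled_combination` is hypothetical, and applying the factor only when it exceeds 1 is an assumption made here.

```python
import math

# (d_i, s_i) in units of 1e-4: the five results marked as used in Table 1.
RESULTS = [(32.0, 30.0), (7.4, 5.9), (23.0, 6.5), (28.0, 4.1), (18.5, 7.3)]

def rescaled_combination(results):
    """Weighted mean, chi-square about it, and the uncertainty enlarged by sqrt(chi2/(N-1))."""
    weights = [1.0 / s**2 for _, s in results]
    total = sum(weights)
    mean = sum(w * d for w, (d, _) in zip(weights, results)) / total
    sigma = 1.0 / math.sqrt(total)
    chi2 = sum((d - mean) ** 2 / s**2 for d, s in results)
    scale = math.sqrt(chi2 / (len(results) - 1))
    # Assumption: the uncertainty is never shrunk when the data agree "too well".
    return mean, sigma * max(1.0, scale), chi2, scale

mean, sigma_scaled, chi2, scale = rescaled_combination(RESULTS)
print(f"chi2 = {chi2:.2f}, scale = {scale:.2f}, result = {mean:.1f} +/- {sigma_scaled:.1f}")
```

Note that the central value is untouched by the rescaling: only the width of the Gaussian changes.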

We see that although the PDG rule gives a distribution wider than that obtained 



⁴ Note that the 'official' world average obtained using the PDG recipe of $(21.2 \pm 4.6) \times 10^{-4}$ (see e.g.
[10, 11, 12]) differs from that given here because all five results of Table 1 are used here, as I do not
see any reason why the 1988 E731 result should be disregarded.

Figure 7: Individual results compared with the standard combination (grey dashed), the
PDG-rescaled combination (solid thin) and the sceptical combination as described in this paper (solid
thick).



by the standard rule, the barycentres of the distributions coincide, thus not taking into
account that one of the results is quite far from where the others seem to cluster. Moreover,
the p.d.f. is assumed to be Gaussian, independently of the configuration of experimental
points. Instead, the sceptical combination takes into account better the configuration of
the data points. The peak of the distribution is essentially determined by the three results
which appear more consistent with each other. Nevertheless, there is a more pronounced
tail for small values of $\mathrm{Re}(\varepsilon'/\varepsilon)$, to take into account that there is indeed a result providing
evidence in that region, and that cannot be ignored.

A quantitative comparison of the different methods is given in Table 2, where the
most relevant statistical summaries are provided (average, mode, median, standard
deviation), together with some probability intervals. It is worth recalling that each of these
summaries gives some information about the distribution, but, when the uncertainty of



Table 2: Comparison of the different methods of combining the results.

  Combination      Mean ($\sigma$)   Median   Mode ± 34% range   99% range       $P[\mathrm{Re}(\varepsilon'/\varepsilon) < 0]$
  Standard         21.4 (2.7)     21.4     21.4 ± 2.7         [14.3, 28.5]    $5 \times 10^{-15}$
  PDG rule [19]    21.4 (4.0)     21.4     21.4 ± 4.0         [11.0, 31.7]    $5 \times 10^{-8}$
  Sceptical        22.7 (3.5)     23.0     23.5 ± 3.4         [11.6, 30.5]    $1.5 \times 10^{-6}$

Figure 8: Dependence of the sceptical combination on the choice of the parameters. Continuous,
dotted and dashed lines are, in order: $\lambda = 0.6$ and $\delta = 1.3$ [$\sigma(r) = 1$]; $\lambda = 0.4$ and $\delta = 1.1$
[$\sigma(r) = 1.5$]; $\lambda = 1.4$ and $\delta = 2.1$ [$\sigma(r) = 0.5$]. The grey-dashed line gives, for comparison, the
result of the standard combination.



this result has to be finally propagated into other results (together with other
uncertainties), it is the average and standard deviation which matter.⁵ An interesting comparison
is given by the probability that $\mathrm{Re}(\varepsilon'/\varepsilon)$ is negative. The sceptical combination gives the
largest value, but still at the level of one part per million, indicating that, even in this
conservative analysis, a positive value of the direct CP violation parameter seems 'practically'
established.

The sensitivity of the result to the parameters of the combination formula can
be inferred from Fig. 8, where the results obtained changing $\sigma(r)$ by ±50% are shown.
The combined result is quite stable. This is particularly true if one remembers that these
extreme values of parameters are quite at the edge of what one would accept as reasonable,
as can be seen in Fig. 3. Note that if one would like to combine the results taking also into
account the uncertainty about the parameters, one would apply Eq. (17). It is reasonable
to think that, since the variations of the p.d.f. from that obtained for the reference value
of the parameters are not very large, the p.d.f. obtained as weighted average over all the
possibilities will not be much different from the reference one.

Figure 9 and Table 3 give the results subdivided into CERN and Fermilab. In these
cases the difference between the standard combination and the sceptical combination
becomes larger, and, again, the outcome of the sceptical combination follows qualitatively
the intuitive one of experienced physicists. The sceptical combination of the CERN results
alone is better than that given by the standard one, thus reproducing formally the



⁵ The standard 'error propagation' is based on linearization, on the property of expected value and
variance under a linear combination and on the central limit theorem (the result of several contributions
will be roughly Gaussian). Therefore, propagating mode (or median) and 68% probability intervals
does not make any sense, unless the input distributions are Gaussian.



Figure 9: Sceptical combination of CERN and Fermilab results (upper and lower plot,
respectively). The continuous line shows the result obtained by Eq. (16) and reference parameters. The
dashed and dotted lines are the results obtained by varying the standard deviation of $r = \sigma/s$ by
+50% and −50%, respectively. The grey-dashed line shows the results obtained by the standard
combination rule.



Table 3: Comparison of the different methods of combining partial results. The symbol * means
that the distribution has less than 34.1% probability on the right side of the mode.

  Combination          Mean ($\sigma$)   Median   Mode ± 34% p. range   99% p. range    $P[\mathrm{Re}(\varepsilon'/\varepsilon) < 0]$
  Stand.  CERN         21.1 (4.8)     21.1     21.1 ± 4.8            [8.6, 33.4]     $6 \times 10^{-6}$
  Stand.  Fermilab     21.4 (2.7)     21.4     21.4 ± 2.7            [12.9, 30.1]    $8 \times 10^{-11}$
  Scept.  CERN         21.0 (3.9)     21.0     21.1 ± 3.6            [9.2, 32.5]     $2.5 \times 10^{-4}$
  Scept.  Fermilab     23.0 (7.1)     25.2     27.1 +*               [2.7, 36.2]     $1.5 \times 10^{-3}$

instinctive suspicion that the uncertainties could have been overestimated. For the
Fermilab ones the situation is reversed. In any case, both partial combinations tend to establish
strongly the picture of a positive and sizeable $\mathrm{Re}(\varepsilon'/\varepsilon)$ value. Finally, note that the ±50%
variations in $\sigma(r)$ produce in the partial combinations a larger effect (although not
relevant for the conclusions) than in the overall combination. This is due to the fact that the
variations produce opposite effects on the two subsets of data in the region of $\mathrm{Re}(\varepsilon'/\varepsilon)$
around $20 \times 10^{-4}$.

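5 Posterior evaluation of $\sigma_i$

An interesting by-product of the method illustrated above is the posterior evaluation
of the various $\sigma_i$, or, equivalently, of the various $r_i$. Again, we can make use of Bayes'
theorem, obtaining

$$ f(\mathbf{r} \,|\, \mathbf{d}, \mathbf{s}, \mu) = \frac{f(\mathbf{d} \,|\, \mathbf{r}, \mathbf{s}, \mu) \cdot f_0(\mathbf{r} \,|\, \mathbf{s}, \mu)}{\int f(\mathbf{d} \,|\, \mathbf{r}, \mathbf{s}, \mu) \cdot f_0(\mathbf{r} \,|\, \mathbf{s}, \mu) \, d\mathbf{r}} , \qquad (18) $$

where $\mathbf{r} = \{r_1, r_2, \ldots, r_n\}$. Since the initial status of knowledge is such that values of $r_i$
are independent of each other, and they are independent of $\mu$ and $\mathbf{s}$, we obtain

$$ f_0(\mathbf{r} \,|\, \mathbf{s}, \mu) = \prod_i f_0(r_i \,|\, s_i, \mu) = \prod_i f_0(r_i) = \prod_i \frac{2\,\lambda^\delta\, r_i^{-(2\delta + 1)}\, e^{-\lambda/r_i^2}}{\Gamma(\delta)} , \qquad (19) $$

having used Eq. (12). As a shorthand for Eq. (19), we shall write in the following simply
$f_0(\mathbf{r}) = \prod_i f_0(r_i)$.

Since the experimental results are also considered independent, we can rewrite
Eq. (18) as

$$ f(\mathbf{r} \,|\, \mathbf{d}, \mathbf{s}, \mu) = \frac{\prod_i f(d_i \,|\, r_i, s_i, \mu) \cdot f_0(r_i)}{\int \prod_i f(d_i \,|\, r_i, s_i, \mu) \cdot f_0(r_i) \, d\mathbf{r}} = \frac{\prod_i f(d_i \,|\, r_i, s_i, \mu) \cdot f_0(r_i)}{\prod_i \int f(d_i \,|\, r_i, s_i, \mu) \cdot f_0(r_i) \, dr_i} . \qquad (20) $$

The marginal distribution of each $r_i$, still conditioned by $\mu$ (and, obviously, by the
experimental values), is obtained by integrating $f(\mathbf{r} \,|\, \mathbf{d}, \mathbf{s}, \mu)$ over all $r_j$, with $j \neq i$. As a result,
we obtain

$$ f(r_i \,|\, \mathbf{d}, \mathbf{s}, \mu) = \frac{f(d_i \,|\, r_i, s_i, \mu) \cdot f_0(r_i)}{\int f(d_i \,|\, r_i, s_i, \mu) \cdot f_0(r_i) \, dr_i} . \qquad (21) $$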


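Making use of Eqs. (10), (12) and (15) we get:

$$ f(r_i \,|\, \mathbf{d}, \mathbf{s}, \mu) = \frac{ \dfrac{1}{\sqrt{2\pi}\, r_i s_i} \exp\left[ -\dfrac{(d_i - \mu)^2}{2 r_i^2 s_i^2} \right] \cdot \dfrac{2\,\lambda^\delta\, r_i^{-(2\delta + 1)}\, e^{-\lambda/r_i^2}}{\Gamma(\delta)} }{ \dfrac{\lambda^\delta\, \Gamma(\delta + 1/2)}{\sqrt{2\pi}\, s_i\, \Gamma(\delta)} \left[ \lambda + \dfrac{(d_i - \mu)^2}{2 s_i^2} \right]^{-(\delta + 1/2)} } . \qquad (22) $$

The final result is obtained by eliminating, in the usual way, the condition $\mu$, i.e.

$$ f(r_i \,|\, \mathbf{d}, \mathbf{s}) = \int f(r_i \,|\, \mathbf{d}, \mathbf{s}, \mu) \cdot f(\mu \,|\, \mathbf{d}, \mathbf{s}) \, d\mu . \qquad (23) $$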



Making use of Eq. (16), and neglecting in Eq. (22) all factors not depending on $r_i$ and $\mu$,
we get the unnormalized result

$$ f(r_i \,|\, \mathbf{d}, \mathbf{s}) \propto r_i^{-(2\delta + 2)}\, e^{-\lambda/r_i^2} \int \exp\left[ -\frac{(d_i - \mu)^2}{2 r_i^2 s_i^2} \right] \prod_{j \neq i} \left[ \lambda + \frac{(d_j - \mu)^2}{2 s_j^2} \right]^{-(\delta + 1/2)} d\mu . \qquad (24) $$

This formula is clearly valid for $n \geq 2$. If this is not the case, the product over $j \neq i$
is replaced by unity, and the integral is proportional to $r_i$. Equation (24) becomes then
$f(r_i \,|\, d_i, s_i) \propto r_i^{-(2\delta + 1)}\, e^{-\lambda/r_i^2}$, i.e. we have recovered the initial distribution (12). In fact,
if we have only one data point, there is no reason to change our beliefs about $r$. Only the
comparison with other results can induce us to change our opinion.

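Once we have got $f(r_i \,|\, \mathbf{d}, \mathbf{s})$ we can give posterior estimates of $r_i$ in terms of average
and standard deviations, and they can be compared with the prior assumption $\mathrm{E}[r] =
\sigma(r) = 1$, to understand which uncertainties have been implicitly rescaled by the sceptical
combination.⁶ Convenient formulae to evaluate numerically first and second moments of
the posterior distribution of $r_i$ are given by⁷

$$ \mathrm{E}[r_i] = \frac{\Gamma(\delta)}{\Gamma(\delta + 1/2)} \; \frac{ \int \left( \lambda + \frac{(d_i - \mu)^2}{2 s_i^2} \right)^{1/2} \prod_j \left( \lambda + \frac{(d_j - \mu)^2}{2 s_j^2} \right)^{-(\delta + 1/2)} d\mu }{ \int \prod_j \left( \lambda + \frac{(d_j - \mu)^2}{2 s_j^2} \right)^{-(\delta + 1/2)} d\mu } , \qquad (25) $$

$$ \mathrm{E}[r_i^2] = \frac{\Gamma(\delta - 1/2)}{\Gamma(\delta + 1/2)} \; \frac{ \int \left( \lambda + \frac{(d_i - \mu)^2}{2 s_i^2} \right) \prod_j \left( \lambda + \frac{(d_j - \mu)^2}{2 s_j^2} \right)^{-(\delta + 1/2)} d\mu }{ \int \prod_j \left( \lambda + \frac{(d_j - \mu)^2}{2 s_j^2} \right)^{-(\delta + 1/2)} d\mu } . \qquad (26) $$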



At this point it is important to anticipate the objection of those who think that it is
incorrect to infer $n + 1$ quantities ($\mu$ and $\mathbf{r}$) starting from $n$ data points. Indeed, there
is nothing wrong in doing so. But, obviously, the results are correlated, and they depend
also on the prior distribution of $r_i$, which acts as a constraint. In fact we have seen above
that for $n = 1$ the result on $r$ is trivial.
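The posterior moments of the rescaling factors can be evaluated with the same kind of grid used for $\mu$. The sketch below averages $(\lambda + (d_i - \mu)^2/(2 s_i^2))^{1/2}$ and $\lambda + (d_i - \mu)^2/(2 s_i^2)$ over the posterior of $\mu$, which is what the moment formulae above amount to. It is an illustration under assumed settings: the function names, grid range and number of points are chosen here, and the inputs are the five results used throughout (units of $10^{-4}$) with $\lambda = 0.6$ and $\delta = 1.3$.

```python
import math

RESULTS = {
    "E731 (1988)": (32.0, 30.0),
    "E731 (1993)": (7.4, 5.9),
    "NA31 (1988+1993)": (23.0, 6.5),
    "KTeV (1999)": (28.0, 4.1),
    "NA48 (1999)": (18.5, 7.3),
}

def bracket(mu, d, s, lam):
    """The recurring factor lam + (d - mu)^2 / (2 s^2)."""
    return lam + (d - mu) ** 2 / (2.0 * s * s)

def posterior_r(results, lam=0.6, delta=1.3, lo=-60.0, hi=100.0, n=8001):
    """Posterior (E[r_i], sigma(r_i)) per experiment, averaging over mu on a grid."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    # Unnormalized posterior of mu on the grid (the sceptical combination itself).
    weights = [
        math.exp(-(delta + 0.5) * sum(math.log(bracket(mu, d, s, lam))
                                      for d, s in results.values()))
        for mu in grid
    ]
    norm = sum(weights)
    c1 = math.gamma(delta) / math.gamma(delta + 0.5)
    c2 = math.gamma(delta - 0.5) / math.gamma(delta + 0.5)
    summary = {}
    for name, (d, s) in results.items():
        b = [bracket(mu, d, s, lam) for mu in grid]
        e1 = c1 * sum(w * math.sqrt(x) for w, x in zip(weights, b)) / norm
        e2 = c2 * sum(w * x for w, x in zip(weights, b)) / norm
        summary[name] = (e1, math.sqrt(e2 - e1 * e1))
    return summary

posterior = posterior_r(RESULTS)
for name, (mean_r, sigma_r) in posterior.items():
    print(f"{name:18s} E[r] = {mean_r:.1f}  sigma(r) = {sigma_r:.1f}")
```

Results whose posterior $\mathrm{E}[r_i]$ ends up well above 1 are those whose stated uncertainty the combination has implicitly enlarged.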

Figure 10 gives the final distributions of $r_i = \sigma_i/s_i$ for the four most precise
determinations of $\mathrm{Re}(\varepsilon'/\varepsilon)$ (the 1988 E731 result has not been plotted because it is very similar
to the NA31 one, as one can understand from Table 4), compared with the reference
initial distribution having $\sigma(r) = 1$ (grey line in the plot). The distributions relative to the
CERN results are shown with continuous lines, the Fermilab ones by dots. In particular,
the one that has a substantial probability mass above 1 is the 1993 E731 result. Average



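⁶ Note that it is incorrect to feed again into the procedure the rescaled uncertainties, as they come
from this analysis. The procedure has already taken into account all possible rescaling factors in the
evaluation of $f(\mu \,|\, \mathbf{d}, \mathbf{s})$.

⁷ Note that, since $\prod_j(\cdots)$ of the integrands are proportional to $f(\mu \,|\, \mathbf{d}, \mathbf{s})$, Eqs. (25)-(26) can be written
in the compact form

$$ \mathrm{E}[r_i] = \frac{\Gamma(\delta)}{\Gamma(\delta + 1/2)} \; \mathrm{E}_\mu\!\left[ \left( \lambda + \frac{(d_i - \mu)^2}{2 s_i^2} \right)^{1/2} \right] , $$

$$ \mathrm{E}[r_i^2] = \frac{\Gamma(\delta - 1/2)}{\Gamma(\delta + 1/2)} \; \mathrm{E}_\mu\!\left[ \lambda + \frac{(d_i - \mu)^2}{2 s_i^2} \right] , $$

where $\mathrm{E}_\mu[\cdot]$ indicates expected values over the p.d.f. of $\mu$.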
Figure 10: Final distributions of $r$ corresponding to the four most precise results on $\mathrm{Re}(\varepsilon'/\varepsilon)$,
compared with the reference prior one (grey line), i.e. having $\mathrm{E}[r_i] = \sigma(r_i) = 1 \ \forall i$. The continuous
lines refer to the CERN results, dotted lines to the Fermilab ones.

and standard deviations of the distributions are given in Table 4, also showing the values
that one would obtain with the other sets of parameters that we have considered to be
edge ones.

Once more, the results are in qualitative agreement with intuition: the CERN
curves are slightly squeezed below r = 1, as the uncertainty evaluation seems to be a bit
conservative. The Fermilab ones show instead some drift towards large r. In particular,
figure and table make one suspect that some contribution to the error has been overlooked
in the E731 data. Note that in this case the average value of the rescaling factor is smaller
than one could expect from alternative procedures which require the overall χ² to equal
the number of degrees of freedom. The reason is the shape of the initial distribution of r,
which protects us against unexpectedly large values of the rescaling factors.



Table 4: Posterior estimation of r_i = σ_i/s_i starting from identical priors having E_0[r] = 1 and
σ_0(r) = 0.5, 1.0 and 1.5. The individual results are given by d_i ± s_i to be consistent with the
notation used throughout this paper.

                                                   Posterior E[r_i] (σ(r_i))
Experiment          d_i (10⁻⁴)   s_i (10⁻⁴)   σ_0(r) = 0.5   σ_0(r) = 1   σ_0(r) = 1.5
E731 (1988)         32           30           0.9 (0.4)      0.8 (0.5)    0.7 (0.5)
E731 (1993)         7.4          5.9          1.6 (0.7)      1.9 (1.2)    2.1 (1.5)
NA31 (1988+1993)    23.0         6.5          0.9 (0.4)      0.8 (0.5)    0.8 (0.6)
KTeV (1999)         28.0         4.1          1.2 (0.6)      1.2 (0.9)    1.2 (1.0)
NA48 (1999)         18.5         7.3          0.9 (0.4)      0.9 (0.5)    0.9 (0.6)

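
As a rough numerical cross-check of Table 4, the posterior moments of the rescaling factors can be evaluated on a grid using the compact form of Eqs. (25)-(26). The sketch below is not the code used for this paper: it assumes a flat prior on μ, a plain grid sum in place of proper quadrature, and prior parameters δ = 1.3 and λ = 0.6, chosen because they give E_0[r] ≈ 1 and σ_0(r) ≈ 1. All function and variable names are illustrative.

```python
import math

# Table 4 inputs in units of 1e-4: (label, d_i, s_i)
DATA = [
    ("E731 (1988)", 32.0, 30.0),
    ("E731 (1993)", 7.4, 5.9),
    ("NA31 (1988+1993)", 23.0, 6.5),
    ("KTeV (1999)", 28.0, 4.1),
    ("NA48 (1999)", 18.5, 7.3),
]

# Assumed prior parameters (not taken from this section): E0[r] ~ 1, sigma0(r) ~ 1.
DELTA, LAM = 1.3, 0.6


def prior_moments(delta=DELTA, lam=LAM):
    """Prior mean and standard deviation of r when 1/r^2 ~ Gamma(delta, rate=lam)."""
    mean = math.sqrt(lam) * math.gamma(delta - 0.5) / math.gamma(delta)
    second = lam / (delta - 1.0)
    return mean, math.sqrt(second - mean ** 2)


def rate(mu, d, s, lam=LAM):
    """lambda + (d - mu)^2 / (2 s^2), the quantity averaged in Eqs. (25)-(26)."""
    return lam + (d - mu) ** 2 / (2.0 * s ** 2)


def posterior_on_grid(data=DATA, lo=-50.0, hi=100.0, n=6001, delta=DELTA, lam=LAM):
    """Return (grid, weights); weights are proportional to f(mu | d, s) and sum to 1."""
    step = (hi - lo) / (n - 1)
    grid = [lo + k * step for k in range(n)]
    dens = []
    for mu in grid:
        log_f = sum(-(delta + 0.5) * math.log(rate(mu, d, s, lam)) for _, d, s in data)
        dens.append(math.exp(log_f))
    total = sum(dens)
    return grid, [w / total for w in dens]


def rescaling_summary(data=DATA, delta=DELTA, lam=LAM):
    """Posterior mean/std of mu and of each r_i via the compact form of Eqs. (25)-(26)."""
    grid, w = posterior_on_grid(data, delta=delta, lam=lam)
    mu_mean = sum(wk * mu for wk, mu in zip(w, grid))
    mu_std = math.sqrt(sum(wk * (mu - mu_mean) ** 2 for wk, mu in zip(w, grid)))
    c1 = math.gamma(delta) / math.gamma(delta + 0.5)        # prefactor of E[r_i]
    c2 = math.gamma(delta - 0.5) / math.gamma(delta + 0.5)  # prefactor of E[r_i^2]
    r_stats = {}
    for label, d, s in data:
        e_r = c1 * sum(wk * math.sqrt(rate(mu, d, s, lam)) for wk, mu in zip(w, grid))
        e_r2 = c2 * sum(wk * rate(mu, d, s, lam) for wk, mu in zip(w, grid))
        r_stats[label] = (e_r, math.sqrt(e_r2 - e_r ** 2))
    return mu_mean, mu_std, r_stats
```

Calling `rescaling_summary()` returns the posterior mean and standard deviation of μ together with a dictionary of (E[r_i], σ(r_i)) pairs, which can be compared with the σ_0(r) = 1 column of Table 4.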
Figure 11: Combined result on Re(ε'/ε) compared with recent and very new theoretical
calculations (see text).



6 Discussion and conclusions 

The problem of combining data which appear in mutual disagreement has been 
analysed from a probabilistic perspective. We have started from the usual hypotheses on 
which the well-known combination rule is based and we have seen that a possible solution 
can be based on a suitable modelling of the uncertainty on the standard deviation which 
describes the Gaussian likelihood. The complete status of uncertainty on the true value 
resulting from the various pieces of information is quantified by a p.d.f. f(μ) which, in
our approach, does not have an a priori defined shape. This property allows one to obtain 
results which never conflict with the intuitive judgement of experienced physicists. The 
method described here also allows one to infer the ratio between the 'true' standard 
deviation and the stated one, as a result of the mutual agreement of the data. 

The application of this method to CP violation results from K⁰ → 2π shows that
the picture of a positive and sizeable value of Re(ε'/ε) survives a sceptical analysis. This
conclusion also holds if one considers separately CERN and Fermilab results. As far as
a number to summarize the result is concerned, the mass of probability is concentrated
around 23.5 × 10⁻⁴, with a ±3.4 × 10⁻⁴ interval having a 68% probability of containing
the true value. However, the p.d.f. has a negative skewness that cannot be ignored. As a
consequence, the expected value is slightly below the mode, at 22.7 × 10⁻⁴. We would like
to re-state that what matters for uncertainty propagation is the expected value, together
with the standard deviation (3.5 × 10⁻⁴), and not the mode, or the median, and the ±34%
probability interval around either of them.
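
The distinction between mode and expected value for a skewed p.d.f. can be illustrated with a small grid evaluation of f(μ | d, s). This is only a sketch under stated assumptions: a flat prior on μ, the prior parameters δ = 1.3 and λ = 0.6 (assumed here because they give E_0[r] ≈ 1 and σ_0(r) ≈ 1), a simple grid sum instead of careful integration, and hypothetical names throughout. The half-width of 3.4 is the one quoted above, in units of 10⁻⁴.

```python
import math

# (d_i, s_i) in units of 1e-4, as listed in Table 4.
RESULTS = [(32.0, 30.0), (7.4, 5.9), (23.0, 6.5), (28.0, 4.1), (18.5, 7.3)]
DELTA, LAM = 1.3, 0.6  # assumed prior parameters, giving E0[r] ~ 1 and sigma0(r) ~ 1


def log_posterior(mu):
    """Unnormalised log f(mu | d, s) for a flat prior on mu."""
    return sum(-(DELTA + 0.5) * math.log(LAM + (d - mu) ** 2 / (2.0 * s ** 2))
               for d, s in RESULTS)


def summarize(lo=-50.0, hi=100.0, n=15001, half_width=3.4):
    """Mode, mean, standard deviation and probability mass within mode +/- half_width."""
    step = (hi - lo) / (n - 1)
    grid = [lo + k * step for k in range(n)]
    dens = [math.exp(log_posterior(mu)) for mu in grid]
    total = sum(dens)
    w = [x / total for x in dens]  # normalised grid weights
    mode = grid[max(range(n), key=dens.__getitem__)]
    mean = sum(wk * mu for wk, mu in zip(w, grid))
    std = math.sqrt(sum(wk * (mu - mean) ** 2 for wk, mu in zip(w, grid)))
    mass = sum(wk for wk, mu in zip(w, grid) if abs(mu - mode) <= half_width)
    return {"mode": mode, "mean": mean, "std": std, "mass_near_mode": mass}
```

With a negatively skewed p.d.f. the returned `mean` lies below the `mode`, and it is `mean` and `std` that should enter any further uncertainty propagation.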

The 1999 experimental results on Re(ε'/ε) have indeed renewed the interest of theorists
in the subject. The comparison of the combined result with recent […] and very new […, 12]
theoretical evaluations is given in Fig. 11, an extension of the updated version […] of
Fig. 2 of Ref. […]. The vertical bands quantify, in some way, the uncertainty stated by the
theoretical teams. The dark-grey bars should have the meaning of 68% central probability
bands, although sometimes they are given as standard deviation of a non-Gaussian
distribution. The grey bars are obtained using a procedure that the theorists call
'scanning' (see original papers), but which has no well-defined probabilistic meaning.
Since scanning produces very pessimistic uncertainty intervals, covering values of
Re(ε'/ε) which the authors hardly believe, one should be careful about concluding from
Fig. 11 that the experimental value of Re(ε'/ε) is well compatible with all the approaches
used to evaluate it. For example, Fig. 12, which shows the p.d.f.'s of the Munich and Rome
teams, alongside that obtained from the combined analysis of the experimental results,
gives a better idea of the mutual compatibility, and of how to interpret the grey bars
of Fig. 11 (note, in particular, the positive skewness of the theory curves and the negative
skewness of the experiment curve). The grey-dashed bar shows the upper 2σ tail of the
result of a recent evaluation […] which gives a very large negative value, having also a
large uncertainty.

Figure 12: Probability density functions resulting from the combined experimental information
about Re(ε'/ε) compared with 1999 theoretical evaluations by the Munich […] and the Rome [12]
groups (the Rome NDR evaluation is very similar to the HV one).

In conclusion, it seems that, given the well-known difficulties both in the experimental
determination and in the theoretical evaluation, the overall picture is not dramatically
worrying (and therefore invoking new phenomenology seems premature). What is practically
certain is that direct CP violation in the neutral-kaon system is established. We
are all looking forward to an accurate theoretical explanation of the effect.